Orientation Estimation


Probabilistic Orientation Estimation with Matrix Fisher Distributions

Neural Information Processing Systems

This paper focuses on estimating probability distributions over the set of 3D rotations, SO(3), using deep neural networks. Learning to regress models to the set of rotations is inherently difficult due to the difference in topology between R^N and SO(3). We overcome this issue by using a neural network to output the parameters of a matrix Fisher distribution, since these parameters are homeomorphic to R^9. Using the negative log-likelihood of this distribution yields a loss that is convex with respect to the network outputs. By optimizing this loss we improve on the state of the art on several challenging datasets, namely Pascal3D+ and ModelNet10-SO(3).
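As a rough illustration of the parameterization described above: the nine unconstrained network outputs can be reshaped into a 3x3 parameter matrix F, and the mode of the matrix Fisher distribution p(R) ∝ exp(tr(F^T R)) is the special-orthogonal Procrustes projection of F onto SO(3). A minimal NumPy sketch (not the authors' code; a random matrix stands in for the network output):

```python
import numpy as np

def fisher_mode(F):
    """Mode of the matrix Fisher distribution p(R) ∝ exp(tr(F^T R)).

    The mode is the rotation matrix closest to F in the Frobenius
    sense: the special orthogonal Procrustes solution via SVD.
    """
    U, _, Vt = np.linalg.svd(F)
    # Flip the last singular direction if needed so det(R) = +1,
    # i.e. R is a proper rotation rather than a reflection.
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    return U @ D @ Vt

# A network would emit 9 unconstrained numbers, reshaped to F.
rng = np.random.default_rng(0)
F = rng.normal(size=(3, 3))
R = fisher_mode(F)
```

Because every F in R^9 maps to a valid rotation this way, the network output space has no topological obstruction, which is the point the abstract makes.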



Orient Anything

Scarvelis, Christopher, Benhaim, David, Zhang, Paul

arXiv.org Artificial Intelligence

Orientation estimation is a fundamental task in 3D shape analysis which consists of estimating a shape's orientation axes: its side-, up-, and front-axes. Using this data, one can rotate a shape into canonical orientation, where its orientation axes are aligned with the coordinate axes. Developing an orientation algorithm that reliably estimates complete orientations of general shapes remains an open problem. We introduce a two-stage orientation pipeline that achieves state-of-the-art performance on up-axis estimation and further demonstrate its efficacy on full-orientation estimation, where one seeks all three orientation axes. Unlike previous work, we train and evaluate our method on all of ShapeNet rather than a subset of classes. We motivate our engineering contributions with theory describing fundamental obstacles to orientation estimation for rotationally symmetric shapes, and show how our method avoids these obstacles.
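Once the three orientation axes are estimated, canonicalization itself is a single rotation. A hypothetical NumPy sketch, assuming the predicted axes form an orthonormal frame (the axis-to-coordinate convention below is illustrative, not the paper's):

```python
import numpy as np

def canonicalize(points, side, up, front):
    """Rotate a point cloud so its predicted orientation axes align
    with the x-, y-, and z-axes respectively.

    side, up, front: unit vectors forming an orthonormal frame.
    Stacking them as rows gives a rotation R whose action expresses
    each point in the shape's own frame, i.e. side -> x, up -> y,
    front -> z.
    """
    R = np.stack([side, up, front])   # rows = target frame axes
    return points @ R.T               # apply R to every point

# Example: a point lying on the shape's up-axis ends up on the y-axis.
pts = np.array([[0.0, 0.0, 1.0]])
out = canonicalize(pts,
                   side=np.array([0.0, 1.0, 0.0]),
                   up=np.array([0.0, 0.0, 1.0]),
                   front=np.array([1.0, 0.0, 0.0]))
```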


Fast Decentralized State Estimation for Legged Robot Locomotion via EKF and MHE

Kang, Jiarong, Wang, Yi, Xiong, Xiaobin

arXiv.org Artificial Intelligence

In this paper, we present a fast and decentralized state estimation framework for the control of legged locomotion. The nonlinear estimation of the floating-base states is decentralized into an orientation estimation via an Extended Kalman Filter (EKF) and a linear velocity estimation via Moving Horizon Estimation (MHE). The EKF fuses inertial sensing with vision to estimate the floating-base orientation. The MHE uses the estimated orientation together with all the sensors within a past time window to estimate the linear velocities, based on a time-varying linear dynamics formulation of the states of interest with state constraints. More importantly, a marginalization method based on the optimization structure of the full information filter (FIF) is proposed to convert the equality-constrained FIF into an equivalent MHE. This decoupling of state estimation achieves the desired balance of computational efficiency, estimation accuracy, and the inclusion of state constraints. The proposed method is shown to provide accurate state estimation for several legged robots, including the highly dynamic hopping robot PogoX, the bipedal robot Cassie, and the quadrupedal robot Unitree Go1, at a frequency of 200 Hz with a window interval of 0.1 s.
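The MHE stage can be pictured as a small windowed least-squares problem that balances velocity measurements against integrated accelerations. The following toy sketch assumes a 1-D velocity state and omits the paper's state constraints and FIF-based marginalization entirely; it only illustrates the horizon structure:

```python
import numpy as np

def mhe_velocity(v_meas, a_meas, dt, w_meas=1.0, w_dyn=10.0):
    """Toy moving-horizon velocity estimate over a window of N steps.

    Fits the trajectory v_0..v_{N-1} so it matches the (noisy)
    velocity measurements and the discrete dynamics
    v_{k+1} = v_k + a_k * dt, each weighted by w_meas / w_dyn.
    """
    N = len(v_meas)
    rows, rhs = [], []
    for k in range(N):                       # measurement residuals
        r = np.zeros(N); r[k] = w_meas
        rows.append(r); rhs.append(w_meas * v_meas[k])
    for k in range(N - 1):                   # dynamics residuals
        r = np.zeros(N); r[k + 1] = w_dyn; r[k] = -w_dyn
        rows.append(r); rhs.append(w_dyn * a_meas[k] * dt)
    v, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return v

# Consistent data (constant acceleration) is recovered exactly.
v_hat = mhe_velocity(np.array([0.0, 0.1, 0.2, 0.3]),
                     np.array([1.0, 1.0, 1.0]), dt=0.1)
```

Because all residuals are linear in the stacked velocities, the window solve stays a fixed-size least-squares problem, which is what makes a 200 Hz update rate plausible.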


Human Orientation Estimation under Partial Observation

Zhao, Jieting, Ye, Hanjing, Zhan, Yu, Zhang, Hong

arXiv.org Artificial Intelligence

Reliable human orientation estimation (HOE) is critical for autonomous agents to understand human intention and perform human-robot interaction (HRI) tasks. Great progress has been made in HOE under full observation. However, existing methods easily make wrong predictions under partial observation and assign them unexpectedly high probability. To solve these problems, this study first develops a method that estimates orientation from the visible joints of a target person, so that it can handle partial observation. We then introduce a confidence-aware orientation estimation method, enabling more accurate orientation estimation and reasonable confidence estimation under partial observation. The effectiveness of our method is validated on both public and custom-built datasets, showing substantial improvements in accuracy and reliability in partial-observation scenarios. In particular, we show in real experiments that our method benefits the robustness and consistency of the robot person following (RPF) task.


Shape Sensing for Continuum Robotics using Optoelectronic Sensors with Convex Reflectors

Osman, Dalia, Du, Xinli, Minton, Timothy, Noh, Yohan

arXiv.org Artificial Intelligence

Three-dimensional shape sensing in soft and continuum robotics is crucial for stable actuation and control in fields such as minimally invasive surgery, as the estimation of complex curvatures is required to manipulate continuum robotic tools through fragile paths. This challenge has been addressed with a range of sensing techniques, for example fibre Bragg grating (FBG) technology, inertial measurement unit (IMU) sensor networks, or stretch sensors. Previously, an optics-based method using optoelectronic sensors was explored, offering a simple and cost-effective solution for shape sensing in a flexible tendon-actuated manipulator in two orientations. This was based on proximity-modulated angle estimation and forms the basis for the shape-sensing method addressed in this paper. The improved and miniaturized technique demonstrated here uses a spherically shaped reflector with optoelectronic sensors integrated into a tendon-actuated robotic manipulator. Upgraded sensing capability is achieved through optimization of the spherical reflector shape in terms of sensor range and resolution, and improved calibration is achieved through the integration of spherical bearings for friction-free motion. Shape estimation is achieved in two orientations upon calibration of the sensors, with a maximum root mean square error (RMSE) of 3.37°.


A Feasibility Study on Indoor Localization and Multi-person Tracking Using Sparsely Distributed Camera Network with Edge Computing

Kwon, Hyeokhyen, Hegde, Chaitra, Kiarashi, Yashar, Madala, Venkata Siva Krishna, Singh, Ratan, Nakum, ArjunSinh, Tweedy, Robert, Tonetto, Leandro Miletto, Zimring, Craig M., Doiron, Matthew, Rodriguez, Amy D., Levey, Allan I., Clifford, Gari D.

arXiv.org Artificial Intelligence

Camera-based activity monitoring systems are becoming an attractive solution for smart building applications with the advances in computer vision and edge computing technologies. In this paper, we present a feasibility study and systematic analysis of a camera-based indoor localization and multi-person tracking system implemented on edge computing devices within a large indoor space. To this end, we deployed an end-to-end edge computing pipeline that utilizes multiple cameras to achieve localization, body orientation estimation, and tracking of multiple individuals within a large therapeutic space spanning 1700 m^2, all while maintaining a strong focus on preserving privacy. Our pipeline consists of 39 edge computing camera systems equipped with Tensor Processing Units (TPUs) placed in the indoor space's ceiling. To ensure the privacy of individuals, a real-time multi-person pose estimation algorithm runs on the TPU of each camera system. This algorithm extracts poses and bounding boxes, which are utilized for indoor localization, body orientation estimation, and multi-person tracking. Our pipeline demonstrated an average localization error of 1.41 meters, a multiple-object tracking accuracy score of 88.6%, and a mean absolute body orientation error of 29°. These results show that localization and tracking of individuals in a large indoor space are feasible even under these privacy constraints.


Sim2Real Grasp Pose Estimation for Adaptive Robotic Applications

Horváth, Dániel, Bocsi, Kristóf, Erdős, Gábor, Istenes, Zoltán

arXiv.org Artificial Intelligence

Adaptive robotics plays an essential role in achieving truly co-creative cyber-physical systems. In robotic manipulation tasks, one of the biggest challenges is to estimate the pose of given workpieces. Even though recent deep-learning-based models show promising results, they require an immense dataset for training. In this paper, two vision-based multi-object grasp pose estimation (MOGPE) models, MOGPE Real-Time and MOGPE High-Precision, are proposed. Furthermore, a sim2real method based on domain randomization is proposed to diminish the reality gap and overcome the data shortage. Our methods yielded 80% and 96.67% success rates in a real-world robotic pick-and-place experiment with the MOGPE Real-Time and MOGPE High-Precision models, respectively. Our framework provides an industrial tool for fast data generation and model training and requires minimal domain-specific data.


Efficient Multi-Task Scene Analysis with RGB-D Transformers

Fischedick, Söhnke Benedikt, Seichter, Daniel, Schmidt, Robin, Rabes, Leonard, Gross, Horst-Michael

arXiv.org Artificial Intelligence

Scene analysis is essential for enabling autonomous systems, such as mobile robots, to operate in real-world environments. However, obtaining a comprehensive understanding of the scene requires solving multiple tasks, such as panoptic segmentation, instance orientation estimation, and scene classification. Solving these tasks given the limited computing and battery capacities of mobile platforms is challenging. To address this challenge, we introduce an efficient multi-task scene analysis approach, called EMSAFormer, that uses an RGB-D Transformer-based encoder to simultaneously perform the aforementioned tasks. Our approach builds upon the previously published EMSANet. However, we show that the dual CNN-based encoder of EMSANet can be replaced with a single Transformer-based encoder. To achieve this, we investigate how information from both RGB and depth data can be effectively incorporated in a single encoder. To accelerate inference on robotic hardware, we provide a custom NVIDIA TensorRT extension enabling highly optimized inference for our EMSAFormer approach. Through extensive experiments on the commonly used indoor datasets NYUv2, SUNRGB-D, and ScanNet, we show that our approach achieves state-of-the-art performance while still enabling inference at up to 39.1 FPS on an NVIDIA Jetson AGX Orin 32 GB.


One RING to Rule Them All: Radon Sinogram for Place Recognition, Orientation and Translation Estimation

Lu, Sha, Xu, Xuecheng, Yin, Huan, Chen, Zexi, Xiong, Rong, Wang, Yue

arXiv.org Artificial Intelligence

LiDAR-based global localization is a fundamental problem for mobile robots. It consists of two stages, place recognition and pose estimation, which yield the current orientation and translation using only the current scan as query and a database of map scans. Inspired by the definition of a recognized place, we consider that a good global localization solution should maintain pose estimation accuracy at a lower place density. Following this idea, we propose a novel framework for sparse place-based global localization, which utilizes a unified and learning-free representation, the Radon sinogram (RING), for all sub-tasks. Based on a theoretical derivation, a translation-invariant descriptor and an orientation-invariant metric are proposed for place recognition, achieving certifiable robustness to arbitrary orientation and large translation between query and map scans. In addition, we utilize the properties of RING to propose a globally convergent solver for both orientation and translation estimation, arriving at global localization. Evaluation of the proposed RING-based framework validates its feasibility and demonstrates superior performance even at a lower place density.
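The orientation-invariance idea can be illustrated in a few lines: a planar rotation of the scan circularly shifts the sinogram along its angle axis, and the DFT magnitude along that axis is invariant to circular shifts. A generic NumPy sketch of this property (a stand-in for the paper's descriptor, not its exact pipeline; random data simulates a sinogram):

```python
import numpy as np

def rotation_invariant_descriptor(sinogram):
    """Magnitude of the DFT along the angle axis of a sinogram.

    Rotating the underlying scan by theta circularly shifts the
    sinogram rows (angle bins); the per-column DFT magnitude is
    unchanged by such shifts, giving an orientation-invariant
    descriptor for place retrieval.
    """
    return np.abs(np.fft.fft(sinogram, axis=0))

rng = np.random.default_rng(1)
sino = rng.normal(size=(180, 64))     # angle bins x radial bins
shifted = np.roll(sino, 30, axis=0)   # simulate a 60-degree rotation
```

The discarded phase is exactly where the rotation lives, which is why a separate correlation-style solver is still needed to recover the actual orientation once a place match is found.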